

BOEING SCIENTIFIC RESEARCH LABORATORIES DOCUMENT D1-82-0599


Maximum  Likelihood  Estimation:  A  Practical  Theorem 
on  Consistency  of  the  Nonparametric  Maximum  Likelihood  Estimates 


with  Applications 


Gordon  B.  Crawford 


MATHEMATICS RESEARCH FEBRUARY 1967


D1-82-0599




MAXIMUM LIKELIHOOD ESTIMATION: A PRACTICAL THEOREM
ON CONSISTENCY OF THE NONPARAMETRIC MAXIMUM LIKELIHOOD ESTIMATES
WITH  APPLICATIONS 


by 


Gordon  B.  Crawford 


Mathematical  Note  No.  503 
Mathematics  Research  Laboratory 
BOEING  SCIENTIFIC  RESEARCH  LABORATORIES 


February  1967 


Summary 

Sufficient  conditions  for  consistency  of  a 
nonparametric  maximum  likelihood  estimate  are  given 
which  are  applicable  to  those  problems  where  a  class 
of  distribution  functions  is  specified  only  in  terms 
of  its  graphs. 

Consistency  is  proven  and  applications  are 
given. 


Introduction 


In many statistical analyses there may be prior considerations
which give information about the shape of a distribution function
(cf. Examples 1-3), whereas there may not be sufficient information
to consign the distribution to some class having a finite-dimensional
parameterization.

Given a class of densities and a sample of independent
identically distributed random variables, we may pick the most likely
density without any reference to a parameterization. We intend to
give conditions on the class in question which will ensure that the
corresponding distribution function converges to the true distribution
function at points of continuity of the latter.

In  addition  to  being  a  more  general  consideration,  consistency 
of  the  estimate  of  the  distribution  function  is  in  many  ways  a  more 
natural  problem  than  consistency  of  the  estimates  of  parameters. 

In the paper [7] by J. Kiefer and J. Wolfowitz there is a
far-reaching paragraph on page 893 which indicates that their results can
be extended to a general case of nonparametric maximum likelihood
estimation of distribution functions.

In most cases of nonparametric maximum likelihood estimation
that have come to our attention (for instance [8]), application of
[7] or [3] is hindered by the fact that justification of the hypotheses
is more difficult than a direct proof of consistency based on the form
of the estimates.

Conditions which seem to be amenable to a wide class of
nonparametric estimation problems are given which are sufficient
for consistency of the M.L.E.

In  view  of  the  examples,  it  is  felt  that  these  conditions  may 
be  more  easily  investigated  and  satisfied  than  perhaps  even  Wald's 
hypothesis  [10]  for  the  classical  parametric  estimation  problem. 

Under  our  hypothesis  the  correspondence  between  the  class  of 
densities  from  which  we  pick  the  most  likely  and  the  corresponding 
class  of  distribution  functions  may  be  many  to  one.  By  avoiding 
the  requirement  that  this  relation  be  one  to  one  we  have  included 
(in  example  2)  consistency  in  the  class  of  distribution  functions 
having  unimodal  densities. 

Since  the  M.L.E.  in  this  class  is,  at  some  observation,  neither 
right  nor  left  continuous,  there  will  almost  surely  be  no  M.L.E.  if 
we  consider  only  a  class  of  densities  which  are  in  one-to-one 
correspondence  with  their  distribution  functions.  In  this  context, 
our result is slightly more general than [7], but in that paper Kiefer
and Wolfowitz do not hypothesize that the distribution functions must
be absolutely continuous with respect to some fixed underlying
sigma-finite measure, as is hypothesized here (Condition 1).

Notation 

In the sequel, a distribution function $F(\cdot)$ is a monotone
non-decreasing function with range in $[0,1]$.

A proper distribution function $F(\cdot)$ is a right continuous
distribution function with range 1.

Given a distribution function $F(\cdot)$, there is, of course, the
corresponding measure $F\{\cdot\}$ on $R$ with the property that
$F\{x : a < x \le b\} = F(b) - F(a)$ whenever $a$, $b$ are continuity points
of $F(\cdot)$. We make no real distinction here, except to reserve the
notation $(\cdot)$ for point functions and $\{\cdot\}$ for set functions.

We assume that we are given a class $\mathcal{F}$ of proper distribution
functions, and we have the a priori knowledge that the distribution
function of a random variable $X$ to be observed is in $\mathcal{F}$. We give
$\mathcal{F}$ the topology of weak convergence: a sequence $F_n(\cdot)$, $n = 1, 2, \ldots$,
converges to $F(\cdot)$ whenever $F_n(x)$ converges to $F(x)$ for all $x$
in the continuity set of $F(\cdot)$.

Let $\mathcal{G}$ denote a compactification of $\mathcal{F}$ whose elements are
distribution functions; again we give $\mathcal{G}$ the topology of weak
convergence.

(If $\mathcal{G}$ contains two or more distribution functions which are
identical except on the discontinuity set of one, then $\mathcal{G}$ is not a
Hausdorff space, but by identifying all such distribution functions we
may form a quotient space which is Hausdorff, and in this quotient space
the topology of weak convergence is the same as the metric topology given
by:

\[ \rho(F,G) = \int |F(x) - G(x)| \, dx. \]

It follows that the decreasing sequence of neighborhoods used in
Lemma 1 and the sequel always exists. In the following it is tacitly
assumed that when necessary we have in mind this quotient space; that
"distinct" distribution functions do not differ only on their discontinuity
sets; and $\bigcap_n \mathcal{N}_n = G$ means that $\bigcap_n \mathcal{N}_n$ does not contain any
distribution function which is distinct from $G$.)

Condition 1. All elements of $\mathcal{G}$ are absolutely continuous with
respect to some fixed $\sigma$-finite measure $\mu(\cdot)$ on $R$.

Let $\mathcal{C}$ be a set of densities with the property that every element
of $\mathcal{C}$ is the density of an element of $\mathcal{G}$, and every element of $\mathcal{G}$ has
at least one density in $\mathcal{C}$. Our maximum likelihood estimate will be
determined by finding the density $f \in \mathcal{C}$ that maximizes the likelihood
of a given sample $X_1, \ldots, X_n = \underline{X}_n$. To facilitate discussion of the
estimate we assume that $\mathcal{C}$ and $\mathcal{G}$ are so chosen that $\mathcal{C}$ will almost
surely contain a $g(\cdot)$ which maximizes the likelihood

\[ L(g, \underline{X}_n) = \frac{1}{n} \sum_{i=1}^{n} \lg g(X_i), \]

where the $X_i$ are independent identically distributed random variables
whose cumulative distribution function is an element of $\mathcal{F}$.
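
As an illustration of picking the most likely density without reference to a parameterization, the following Python sketch evaluates the averaged log-likelihood $(1/n) \sum_i \lg g(X_i)$ for each member of a small finite family and returns the maximizer. The function names, the two candidate densities, and the sample are hypothetical and chosen only for illustration; $\lg$ is taken to be the natural logarithm, and a finite family stands in for the class of densities.

```python
import math

def avg_log_likelihood(density, sample):
    """Averaged log-likelihood (1/n) * sum_i log g(X_i); -inf if some g(X_i) = 0."""
    total = 0.0
    for x in sample:
        value = density(x)
        if value <= 0.0:
            return float("-inf")
        total += math.log(value)
    return total / len(sample)

def most_likely(candidates, sample):
    """Name of the candidate density with the largest averaged log-likelihood."""
    return max(candidates,
               key=lambda name: avg_log_likelihood(candidates[name], sample))

# Hypothetical candidate densities on [0, 1] (illustration only).
candidates = {
    "uniform": lambda x: 1.0 if 0.0 <= x <= 1.0 else 0.0,
    "decreasing": lambda x: 2.0 * (1.0 - x) if 0.0 <= x <= 1.0 else 0.0,
}
sample = [0.05, 0.10, 0.20, 0.30, 0.45]

if __name__ == "__main__":
    print(most_likely(candidates, sample))
```

Averaging by $1/n$ does not change the maximizer; it only puts the likelihood on the scale to which the law of large numbers applies.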

Condition 2. There exists a countable subset $g_m$, $m = 1, 2, \ldots$, of
$\mathcal{C}$ with the property that for all small neighborhoods $\mathcal{N}$ of $G$:

\[ \sup_{h : H \in \mathcal{N}} h(x) = \sup_{g_m : G_m \in \mathcal{N}} g_m(x) \quad \text{a.e. } \mu\{\cdot\}. \]

Let $s(\cdot, \mathcal{N})$ denote the supremum function:

\[ s(x, \mathcal{N}) = \sup_{h : H \in \mathcal{N}} h(x). \]

Then Condition 2 assures the measurability of $s(x, \mathcal{N})$. We also
require

Condition 3. Given $F$ in $\mathcal{F}$, $G$ in $\mathcal{G}$, there exists a small
neighborhood $\mathcal{N}$ of $G$ such that the function

\[ \lg s(x, \mathcal{N}) \]

is bounded above by a function which is integrable with respect to
$F\{\cdot\}$.

Main  Theorem  and  Lemmas 

Lemma 1. Let $\mathcal{N}_n$ be a decreasing sequence of neighborhoods of
$G$ in $\mathcal{G}$ such that $\bigcap_n \mathcal{N}_n = G$. Then, if $g \in \mathcal{C}$ is any density of $G$,
we have

\[ \lim_n E_F[\lg s(x, \mathcal{N}_n) - \lg g(x)] = 0. \tag{i} \]

Proof. Using Condition 3 we have:

\[ \lim_n E_F[\lg s(x, \mathcal{N}_n) - \lg g(x)] = E_F[\lim_n (\lg s(x, \mathcal{N}_n) - \lg g(x))]. \]

We intend to show that the function in round brackets above
converges in $F$ measure to zero. It follows (Halmos, p. 91) that it
is convergent a.e. (with respect to $F$) to the zero function, since it
is decreasing in $n$; hence the limit (i) is zero, as was to be shown.

To show

\[ \lim_n F\{x : \lg s(x, \mathcal{N}_n) - \lg g(x) \ge \varepsilon\} = 0, \]

define (with reference to the countable subset $g_m(\cdot)$, $m = 1, 2, \ldots$, of $\mathcal{C}$),

\[ B_{m_n} = \begin{cases} \{x : \lg g_m(x) - \lg g(x) > \varepsilon/2\} & \text{if } G_m(\cdot) \in \mathcal{N}_n, \\ \emptyset & \text{otherwise.} \end{cases} \]

Then

\[ \lim_n F\{x : \lg s(x, \mathcal{N}_n) - \lg g(x) \ge \varepsilon\} \le \lim_n F\Big\{\bigcup_m B_{m_n}\Big\}. \]

But $F\{\cdot\}$ is finite, and $B_{m_n}$, $n = 1, 2, \ldots$, is a decreasing nest
of sets with

\[ \lim_n F\{B_{m_n}\} = 0, \]

therefore

\[ \lim_n F\Big\{\bigcup_m B_{m_n}\Big\} = 0. \qquad \blacksquare \]

Lemma 2. If $F \in \mathcal{F}$ and $G \in \mathcal{G}$ are distinct elements of $\mathcal{G}$ and
$f$ and $g$ are any two corresponding densities in $\mathcal{C}$, then

\[ E_F[\lg g(x) - \lg f(x)] < 0. \]

Proof. Let $f$ have support $S$; then

\[ E_F \lg[g(x)/f(x)] \le \lg E_F[g(x)/f(x)] = \lg \int_S g(x)\,\mu(dx) \le 0. \]

Moreover, $E_F \lg[g(x)/f(x)] < \lg E_F[g(x)/f(x)]$ unless on $S$ the real
valued random variable $g(x)/f(x)$ is almost surely equal to some
constant $c$; but $c \le 1$ since $\int f(x)\,\mu(dx) = 1$ and $\int g(x)\,\mu(dx) \le 1$;
and $c \ne 1$ since $F$ and $G$ are distinct; hence either
$E_F \lg[g(x)/f(x)] < \lg E_F[g(x)/f(x)]$ or $\lg \int_S g(x)\,\mu(dx) = \lg c < 0$. $\blacksquare$
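
The two cases in the proof of Lemma 2 can be checked numerically on a finite sample space. The sketch below is illustrative only: the helper name and the three-point densities are hypothetical, counting measure stands in for $\mu$, and $\lg$ is taken as the natural logarithm. The pair `f`, `g` exercises the strict Jensen inequality; the pair `f`, `sub` exercises the case where $g/f$ is a constant $c < 1$, which arises when $G$ is an improper element of the compactification.

```python
import math

def expected_log_ratio(f, g):
    """E_F[log(g/f)] for discrete densities f, g listed over the same points."""
    total = 0.0
    for fi, gi in zip(f, g):
        if fi == 0.0:
            continue              # points outside the support S of f contribute nothing
        if gi == 0.0:
            return float("-inf")  # log(0) on a set of positive F measure
        total += fi * math.log(gi / fi)
    return total

f = [0.5, 0.3, 0.2]        # proper density (sums to 1)
g = [0.2, 0.5, 0.3]        # a different proper density
sub = [0.25, 0.15, 0.10]   # g = f/2: total mass 1/2, so c = 1/2

if __name__ == "__main__":
    print(expected_log_ratio(f, g))    # strictly negative by Jensen
    print(expected_log_ratio(f, sub))  # equals log(1/2)
```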

Theorem 1. Let $G$ in $\mathcal{G}$, $F \in \mathcal{F}$ be distinct, and let $f \in \mathcal{C}$ be
a density of $F$. If $X_1, \ldots, X_n = \underline{X}_n$ is a sample with distribution
function $F$, then there exists a neighborhood $\mathcal{N}$ of $G$ such that

\[ \limsup_n \Big[ \sup_{h : H \in \mathcal{N}} L(h, \underline{X}_n) - L(f, \underline{X}_n) \Big] < 0 \]

with $F$ probability 1.

Proof. Given $G$ not equal to $F$ there exists (by Lemmas 1 and 2)
a small neighborhood $\mathcal{N}$ of $G$ such that

\[ E_F[\lg s(x, \mathcal{N}) - \lg f(x)] = E_F[(\lg s(x, \mathcal{N}) - \lg g(x)) + (\lg g(x) - \lg f(x))] < 0. \tag{ii} \]

But

\[ \sup_{h : H \in \mathcal{N}} L(h, \underline{X}_n) - L(f, \underline{X}_n) \le \frac{1}{n} \sum_{i=1}^{n} (\lg s(X_i, \mathcal{N}) - \lg f(X_i)), \]

and by the law of large numbers the latter quantity converges to the
negative expression (ii).

Corollary to Theorem 1. Let $F$ and $f$ be as in Theorem 1.
Let $D$ be any closed set not containing $F$. Then

\[ \limsup_n \Big[ \sup_{h : H \in D} L(h, \underline{X}_n) - L(f, \underline{X}_n) \Big] < 0 \]

with $F$ probability 1.

Proof. By Theorem 1, any $G$ in $D$ can be covered by an open
neighborhood $U_G$ with the property that

\[ \limsup_n \Big[ \sup_{h : H \in U_G} L(h, \underline{X}_n) - L(f, \underline{X}_n) \Big] < 0. \]

From the open cover $\{U_G, G \in D\}$ of $D$, let $U_1, \ldots, U_m$ be a finite
subcover. Then with $F$ probability 1,

\[ \limsup_n \Big[ \sup_{h : H \in D} L(h, \underline{X}_n) - L(f, \underline{X}_n) \Big]
   \le \limsup_n \max_{1 \le i \le m} \Big[ \sup_{h : H \in U_i} L(h, \underline{X}_n) - L(f, \underline{X}_n) \Big] \]
\[ = \max_{1 \le i \le m} \limsup_n \Big[ \sup_{h : H \in U_i} L(h, \underline{X}_n) - L(f, \underline{X}_n) \Big] < 0 \]

as was to be shown.

Theorem 2. (Consistency of Maximum Likelihood Estimate). Let
$f(\cdot\,; \underline{X}_n)$ be a point of $\mathcal{C}$ depending on the random variable
$X_1, \ldots, X_n = \underline{X}_n$ such that for all $f$ in $\mathcal{C}$ and some fixed $c > 0$,

\[ \frac{\prod_{i=1}^{n} f(X_i; \underline{X}_n)}{\prod_{i=1}^{n} f(X_i)} \ge c. \]

Then $F(\cdot, \underline{X}_n)$ converges to the distribution function $F(\cdot)$ of
$X_1, \ldots, X_n$ with $F$ probability 1.

Proof. For notational simplicity, write $F_n(\cdot) = F(\cdot, \underline{X}_n)$,
$f_n(\cdot) = f(\cdot, \underline{X}_n)$; if

\[ \frac{\prod_{i=1}^{n} f_n(X_i)}{\prod_{i=1}^{n} f(X_i)} \ge c > 0, \]

then

\[ \limsup_n \left[ \frac{\prod_{i=1}^{n} f_n(X_i)}{\prod_{i=1}^{n} f(X_i)} \right]^{-1/n} \le 1, \tag{2.1} \]

and therefore

\[ \limsup_n \, [L(f_n, \underline{X}_n) - L(f, \underline{X}_n)] \ge 0. \]

Let $U$ be any open neighborhood of $F$. If $F_n$ is outside of
$U$ infinitely often, then

\[ \limsup_n \Big[ \sup_{h : H \in \mathcal{G} - U} L(h, \underline{X}_n) - L(f, \underline{X}_n) \Big] \ge 0. \]

By the corollary to Theorem 1, this can occur only on a set of measure
zero. Now, let $U_i$ be a decreasing sequence of neighborhoods of $F$
whose intersection is $F$. Corresponding to each $U_i$ there is, by the
corollary, an event $S_i$ of $F$ measure zero such that on the complement
of $S_i$, $F_n$ is eventually in $U_i$. It follows that on the complement
of $\bigcup_{i=1}^{\infty} S_i$, $F_n$ is eventually in every neighborhood of $F$. Thus $F_n$
converges to $F$ with $F$ probability 1, as was to be shown. $\blacksquare$


Examples 

Example 1. (cf. [9]). Let $\mathcal{G}_m$ be the class of distribution
functions on $[0, +\infty)$ which possess a Radon-Nikodym derivative which is
nonincreasing and bounded above by $m$, and let $\mathcal{G} = \bigcup_{m < +\infty} \mathcal{G}_m$. It follows from the
form of the estimate and considerations of truncated data that if $G \in \mathcal{G}$, and
$t^*$ is fixed, $m > g(t^*)$, then consistency of the estimate in $\mathcal{G}_m$ implies
consistency of the estimate in $\mathcal{G}$ at the point $t^*$; but $t^*$ is arbitrary.

We will show how to compute the estimate in $\mathcal{G}$ and
prove consistency in $\mathcal{G}_m$, thus establishing consistency in $\mathcal{G}$.

Computation of the M.L.E. in $\mathcal{G}$. Given a sample $X_1 \le \cdots \le X_n$, it
is clear that the M.L.E. $f_n(\cdot)$ of the density will put as little mass
between observations as is consistent with the hypothesis that $f_n(\cdot)$
be decreasing. Thus $f_n(\cdot)$ will be a left continuous step function with
heights $a_1, \ldots, a_n$, subject to:

(i) $a_1 \ge a_2 \ge \cdots \ge a_n$,

(ii) $\sum_{i=1}^{n} a_i (X_i - X_{i-1}) \le 1$ (letting $X_0 = 0$).

If we consider maximizing the likelihood $\prod_{i=1}^{n} a_i$ subject only
to the requirement (ii) we see that the maximum occurs when

\[ \bar{a}_i (X_i - X_{i-1}) = 1/n \quad \text{for all } i; \]

hence

\[ \bar{a}_i = \frac{1}{n (X_i - X_{i-1})}. \]

The collection of $a_i$'s which maximizes $\prod_{i=1}^{n} a_i$ subject to the
constraints (i) and (ii) may be obtained from the sequence $\bar{a}_i$ by
a direct application of [1]:

\[ a_i = \frac{1}{n} \max_{v \ge i} \min_{u \le i-1} \frac{v - u}{X_v - X_u}, \]

\[ f_n(y) = \frac{1}{n} \max_{v \ge i} \min_{u \le i-1} \frac{v - u}{X_v - X_u}, \qquad X_{i-1} < y \le X_i. \]
Consistency of the M.L.E. in $\mathcal{G}_m$. If $\{f_n(\cdot)\}$ is a sequence of
densities in $\mathcal{C}_m$, then by the Helly weak compactness theorem there exists
a convergent subsequence, call it $f_{n_m}(\cdot)$, which converges to a density
$f(\cdot)$ in $\mathcal{G}_m$. If we could show that this implies convergence of the
subsequence $F_{n_m}(\cdot)$ of $F_n(\cdot)$ to the point $F(\cdot)$ of $\mathcal{G}_m$, then we would
have shown that $\mathcal{G}_m$ is compact. Take $x$ fixed; it suffices to show

\[ \lim_{n_m} \int_0^x f_{n_m}(y)\,dy = \int_0^x \lim_{n_m} f_{n_m}(y)\,dy. \]

This interchange follows from the bounded convergence theorem,
since any $f(\cdot)$ in $\mathcal{C}_m$ is less than or equal to

\[ h(y) = \begin{cases} m, & 0 < y < 1/m, \\ 1/y, & y \ge 1/m, \end{cases} \]

and $h(\cdot)$ is integrable on $(0, x)$.

Condition 1 is automatic; $\mu$ = Lebesgue measure.

Condition 2. Let $\{g_m(\cdot)\}$ be the densities in $\mathcal{G}_m$ which are
"rational step functions", i.e., step functions with jumps at the
rationals and rational values.

Condition 3. $\lg s(x, \mathcal{N}) \le \lg m$, which is integrable with respect
to $F(\cdot)$.

Thus, consistency follows from the theorem on consistency of the
M.L.E.

Example 2. (cf. [9]). Let $\mathcal{H}_m$ be the class of distribution
functions which possess a Radon-Nikodym derivative $f(\cdot)$ which is
bounded above by $m$ and is unimodal, i.e., there exists some fixed
$t_0$ such that if $t_1 > t_2 \ge t_0$ or $t_0 \ge t_2 > t_1$, then $f(t_1) \le f(t_2)$.
(Any such $t_0$ will be referred to as a turning point.)

As in Example 1, consistency of the M.L.E. in the class
$\mathcal{H} = \bigcup_{m < +\infty} \mathcal{H}_m$ follows from consistency of the M.L.E. in $\mathcal{H}_m$.

Computing the M.L.E. in $\mathcal{H}_m$. As before, to maximize the likelihood
at the observations it is necessary to make the M.L.E. of the density as
small as possible between observations subject to the condition that it
be unimodal. Thus we can make the following statements about the M.L.E.:

Some observation $X_j$ will be a turning point. At this
observation the M.L.E. $f_n(\cdot)$ will have the maximum permissible value
$m$, since we may take $f_n(\cdot)$ to be discontinuous at this $X_j$.

To the right of $X_j$, $f_n(\cdot)$ will be a left continuous step function
and will be zero to the right of $X_n$.

To the left of $X_j$, $f_n(\cdot)$ will be a right continuous step function
and will be zero to the left of $X_1$.

If we knew the turning point $X_j$ and knew that $f_n(\cdot)$ had mass $\delta$,
$0 \le \delta \le 1$, to the left of $X_j$, then we could proceed as before:

\[ y < X_j: \quad f_{n_j}(y) = \frac{\delta}{j-1} \min_{v \ge i+1} \max_{u \le i} \frac{v - u}{X_v - X_u}, \qquad X_i \le y < X_{i+1}, \]

\[ y > X_j: \quad f_{n_j}(y) = \frac{1-\delta}{n-j} \max_{v \ge i+1} \min_{u \le i} \frac{v - u}{X_v - X_u}, \qquad X_i < y \le X_{i+1}. \]

Keeping $X_j$ fixed and writing the likelihood as a function of $\delta$:

\[ \prod_{i=1}^{n} f_{n_j}(X_i) = \delta^{\,j-1} (1-\delta)^{\,n-j} \cdot \varphi(X_1, \ldots, X_n), \]

where $\varphi$ is independent of $\delta$. Thus the choice of $\delta$ which
maximizes the likelihood based on the assumption that $X_j$ is the
turning point is $\delta = (j-1)/(n-1)$. Hence the M.L.E. $f_n(\cdot)$ is equal to
that $f_{n_j}(\cdot)$ for which the likelihood (using the optimal choice of $\delta$) is
a maximum over all choices of $j$.
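
The optimal mass $\delta = (j-1)/(n-1)$ can be verified with a short calculation. The derivation below is a worked check, assuming $1 < j < n$ so that both exponents are positive and taking $\lg$ as the natural logarithm; it uses only the $\delta$-dependent factor $\delta^{\,j-1}(1-\delta)^{\,n-j}$ of the likelihood.

```latex
\begin{align*}
\frac{d}{d\delta}\,\lg\!\left[\delta^{\,j-1}(1-\delta)^{\,n-j}\right]
  &= \frac{j-1}{\delta} - \frac{n-j}{1-\delta} = 0 \\
\Longrightarrow\quad (j-1)(1-\delta) &= (n-j)\,\delta \\
\Longrightarrow\quad \delta &= \frac{j-1}{n-1}.
\end{align*}
```

The second derivative, $-(j-1)/\delta^2 - (n-j)/(1-\delta)^2$, is negative on $(0,1)$, so the stationary point is the maximum. In the boundary cases $j = 1$ and $j = n$ the factor is monotone in $\delta$ and the same formula gives $\delta = 0$ and $\delta = 1$ respectively.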




Consistency of the M.L.E. in $\mathcal{H}_m$. Again, by considerations
similar to the Helly weak compactness theorem, we may show that the
collection of unimodal densities of $\mathcal{H}_m$ is weakly compact. (Recall
that $\mathcal{H}_m$ will contain other than proper distribution functions.)
What remains is to show that if $F_n(\cdot)$, $F(\cdot)$ are in $\mathcal{H}_m$, and
$f_n(x) \to f(x)$ at all points of continuity of $f(\cdot)$, then $F_n(x) \to F(x)$.

Let $t_n$ be a turning point of $f_n(\cdot)$; then if necessary we
may take a subsequence $t_{n_m}$ which converges to some point $t_0$,
$-\infty \le t_0 \le +\infty$. For all large $n$, $t_n \in (t_0 - 1, t_0 + 1)$, thus as before, for
sufficiently large $n$, $f_n(y) \le h(y)$, where

\[ h(y) = \begin{cases}
m, & t_0 - 1 - 1/m < y < t_0 + 1 + 1/m, \\
1/(y - (t_0 + 1)), & t_0 + 1 + 1/m \le y, \\
1/((t_0 - 1) - y), & y \le t_0 - 1 - 1/m.
\end{cases} \]

Hence for fixed, finite $a$, $b$:

\[ \lim_n \int_a^b f_n(y)\,dy = \int_a^b \lim_n f_n(y)\,dy, \]

\[ \lim_n \, [F_n(b) - F_n(a)] = F(b) - F(a). \]

But every element of $\mathcal{H}_m$ is the integral of a unimodal density
and therefore $F_n(-\infty) = 0$, $F(-\infty) = 0$, hence $F_n(b) \to F(b)$ for all $b$,
as was needed to show $\mathcal{H}_m$ compact.

The conditions 1, 2 and 3 are clearly satisfied; $\mu$ = Lebesgue
measure; $\{g_m(\cdot)\}$ may be taken as "rational step functions", and
$\lg s(x, \mathcal{N}) \le \lg m$, hence consistency follows from Theorem 2.

Example 3. Let $\mathcal{F}_m$ ($m < +\infty$) denote the class of distribution
functions $F(\cdot)$ with the property that $-\lg(1 - F(\cdot))$ has a
Radon-Nikodym derivative $r(\cdot)$ which is bounded above by $m$, and $-r(\cdot)$
is unimodal; i.e., there exists $t_0$, $0 \le t_0 \le +\infty$, such that if
$t_0 \le t_1 \le t_2$ or $t_2 \le t_1 \le t_0$, then $r(t_1) \le r(t_2)$. (Any such
$t_0$ will be referred to as a turning point.)

Let the class $\mathcal{F} = \bigcup_{m < +\infty} \mathcal{F}_m$ be the class of distribution functions
having unimodal hazard rates.

Computing the M.L.E. in $\mathcal{F}_m$. (cf. [2]).

As in other examples of hazard rate estimation, the unrestrained
M.L.E. has a hazard rate which is a step function; in this case it is of
the form

\[ r_n^*(t) = [(n-j)(X_{j+1} - X_j)]^{-1}, \qquad t \in (X_j, X_{j+1}). \]

The likelihood is a monotone function of these values and, subject to the
bounds $m$, $\varepsilon$, it is maximized by setting

\[ \bar{r}_n(t) = [(n-j)(X_{j+1} - X_j)]^{-1} \wedge m, \qquad t \in (X_j, X_{j+1}). \]
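
The unrestrained step values and their truncation at $m$ are simple to tabulate. The following Python sketch is illustrative only: the function name is hypothetical, the observations are assumed distinct, and only the upper bound $m$ is applied. It does not perform the monotone ("Brunkized") smoothing or the search over turning points.

```python
def capped_hazard_steps(sample, m):
    """Step values of the unrestrained hazard-rate estimate, capped at m.

    For j = 1, ..., n-1 returns (X_j, X_{j+1}, raw, capped), where
        raw    = 1 / ((n - j) * (X_{j+1} - X_j)),
        capped = min(raw, m).
    Assumes distinct observations.
    """
    xs = sorted(sample)
    n = len(xs)
    steps = []
    for j in range(1, n):                  # 1-based index j, as in the text
        left, right = xs[j - 1], xs[j]     # X_j and X_{j+1}
        raw = 1.0 / ((n - j) * (right - left))
        steps.append((left, right, raw, min(raw, m)))
    return steps

if __name__ == "__main__":
    for step in capped_hazard_steps([1.0, 2.0, 2.5], m=1.0):
        print(step)
```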

As before, if we knew the turning point $t_0$, it would follow that the
M.L.E. having unimodal hazard rate is just the decreasing "Brunkized"
version of the $r_n^*(t)$ to the left of $t_0$ and the increasing "Brunkized"
version of the $r_n^*(t)$ to the right of $t_0$. Hence the M.L.E. in $\mathcal{F}_{m,\varepsilon}$
can be obtained by maximizing this expression over all choices of $t_0$.

It follows from the form of the expression (cf. [8]) that the
estimate in the class $\mathcal{F}_m$ is equal to $(r_n(t) \wedge m)$ where $r_n(t)$ is
the hazard rate of the estimate in $\mathcal{F}$.

Consistency of the M.L.E. in $\mathcal{F}_m$.

First we show that $\mathcal{F}_m$ is compact. Take $\{G_n(\cdot)\}_{n=1}^{\infty}$
having hazard rates $\{r_n(\cdot)\}_{n=1}^{\infty}$ and turning points $\{t_n\}_{n=1}^{\infty}$.
Then there is a subsequence, call it $\{t_m\}_{m=1}^{\infty}$, such that $t_m \to t_0$,
$0 \le t_0 \le +\infty$. It follows from considerations similar to the Helly-Bray
lemma that there is a subsequence of $\{r_m(\cdot)\}_{m=1}^{\infty}$, call it $\{r_\ell(\cdot)\}_{\ell=1}^{\infty}$,
which converges to some nonincreasing function on $[0, t_0)$. Similarly,
there is a subsequence (call it $\{r_k(\cdot)\}_{k=1}^{\infty}$) of $\{r_\ell(\cdot)\}_{\ell=1}^{\infty}$ which
converges to some nondecreasing function on $[t_0, +\infty)$. It follows that
$\{r_k(\cdot)\}_{k=1}^{\infty}$ converges to a unimodal hazard function which is bounded
above by $m$; hence (by the bounded convergence theorem) we have found
a convergent subsequence of $G_n(\cdot)$ whose limit is in $\mathcal{F}_{m,\varepsilon}$, as was
to be shown.

Again the conditions 1, 2, and 3 are immediate; $\mu$ = Lebesgue
measure; $\{g_m\}$ may be taken as densities with hazard rates which are
unimodal "rational step functions", and $\lg s(x, \mathcal{N}) \le \lg m$.

Thus consistency follows from the theorem on consistency of the M.L.E.

Other examples. In the manner of Examples 1-3 we could also show
consistency for the class of distributions having increasing (or
decreasing) hazard rates bounded above by $m$. As in Example 1,
consistency of these classes without the bound follows easily
(cf. [8], [5]).

Following the lines of Example 3 we could consistently estimate
a bimodal density and compute the estimate in a similar manner. It
should be mentioned that in [9] Rao gives (as the solution of a certain
heat equation) the asymptotic distribution of the M.L.E. of a unimodal
density with known mode.

Similarly one could prove the consistency of the M.L.E. of a
distribution having convex hazard rate. An algorithm for computing
such an estimate has been found in [4].

In the classical parametric estimation problems it suffices
(in all those examples we have considered) to take a compactification
of the parameter space and a countable dense subset of this space.
The troublesome Condition 3 is the same as one of Wald's hypotheses
[10].